| import streamlit as st | |
| import pandas as pd | |
| def show_back_button(previous_page): | |
| if st.button("Back"): | |
| previous_page() | |
| st.title(":rainbow[Life Cycle of a Machine Learning Project]") | |
| st.write(""" | |
| Welcome to an easy-to-follow guide on the **Steps of a Machine Learning Project**! | |
| Click on each step to learn what happens in a way that's simple to understand. | |
| """) | |
| st.divider() | |
| if st.button("Problem Statement 🧠"): | |
| st.write("The first thing we need to do is figure out exactly what problem we're trying to solve. It's like knowing what kind of dish we want to cook before we start. Do we want to predict house prices or figure out if an email is spam?", | |
| "For example, if we want to predict the price of a house, the question might be: 'How can we guess the price of a house based on its size, age, and location?'") | |
| show_back_button(lambda: st.session_state.update(current_page="main")) | |
| # Define state for showing nested buttons | |
| if "show_nested" not in st.session_state: | |
| st.session_state.show_nested = False | |
| # Primary button | |
| if st.button("Collecting Data π"): | |
| st.session_state.show_nested = not st.session_state.show_nested | |
| st.markdown(""" | |
| Once we know the problem, we gather the data we need to help solve it. It's like gathering all the ingredients before cooking. The more accurate the ingredients (data), the better our final result. | |
| For our house price example, we might gather data about the size of houses, their number of rooms, location, and age. We can get this from websites like Zillow or public datasets. | |
| """) | |
| st.title("π Data Collection Exploration π") | |
| st.header("What is Data? 🤔 Let's Find Out!") | |
| st.subheader(":rainbow[Understanding Data]π") | |
| st.markdown(""" | |
| **Data** is all around us! It's simply information collected from different sources to help us analyze, predict, and make decisions. | |
| It can exist in various forms, and data collection is crucial in many industries today, such as business, healthcare, research, and beyond. | |
| **Examples of Data**: | |
| - 📸 **Images** | |
| - **Text** | |
| - 🎥 **Videos** | |
| - 🎶 **Audio** | |
| - 🏷️ **Sensor Data** | |
| """) | |
| st.subheader(":green-background[Types of data:]") | |
| st.markdown("Based on its structure, data can be classified into three types: structured, unstructured, and semi-structured.") | |
| st.subheader(":blue[Structured Data]π:") | |
| st.markdown(""" | |
| **Structured data** is well-organized and stored in a tabular format, like rows and columns. It's easy to search, analyze, and process. | |
| Some popular examples include: | |
| - **Excel Files** (.xlsx, .xls) | |
| - 💻 **SQL Databases** (MySQL, PostgreSQL) | |
| **Why Structured Data?** | |
| - Easy to manage and query. | |
| - Can be stored in databases and spreadsheets. | |
| - Facilitates fast analysis and reporting. | |
| """) | |
| st.image("https://www.w3schools.com/datascience/img_structured_data.png") | |
| if st.session_state.show_nested: | |
| if st.button(":rainbow[Excel]π"): | |
| st.markdown("Excel files store data in a tabular format with rows and columns. They are widely used in business and analysis.") | |
| st.subheader("Handling of xlsx files") | |
| st.markdown(":rainbow[How to read excel files?]") | |
| code = """ | |
| import pandas as pd | |
| df = pd.read_excel("your_file.xlsx") | |
| print(df.head())""" | |
| st.code(code,language="python") | |
| st.markdown(":rainbow[**Issues encountered while handling Excel files:**]") | |
| st.markdown(":grey-background[**1.Parser Error**:]Parser errors occur when there is an issue with reading or interpreting the data in a file.") | |
| st.markdown(":rainbow[read_excel has no on_bad_lines option (that belongs to read_csv), so name the engine, sheet and header row explicitly:]") | |
| code = """ | |
| import pandas as pd | |
| # skiprows drops junk rows above the real header row | |
| df = pd.read_excel(r"your_file.xlsx", engine="openpyxl", sheet_name=0, skiprows=0, header=0) | |
| """ | |
| st.code(code,language = "python") | |
| st.markdown(":grey-background[**2.Encoding Error**:]Encoding errors occur when there is a mismatch between the actual encoding of a file and the encoding expected by the software trying to read it. They mainly affect text exports such as CSV; `.xlsx` files store text as Unicode, so `pd.read_excel` takes no `encoding` argument.") | |
| st.markdown(":rainbow[for a CSV export of the sheet, we can try candidate encodings with try and except blocks]") | |
| code = """ | |
| import pandas as pd | |
| candidates = ["utf-8","latin-1","utf-16"] | |
| for enc in candidates: | |
|     try: | |
|         pd.read_csv(r"your_file.csv",encoding=enc) | |
|         print("{} decodes the file without errors".format(enc)) | |
|     except UnicodeError: | |
|         print("{} cannot decode the file".format(enc)) | |
| """ | |
| st.code(code,language = "python") | |
| st.markdown(":grey-background[**3.Out Of Memory**:]Out of memory errors occur when a program tries to use more memory than is available on the system.") | |
| st.markdown(":rainbow[read_excel has no chunksize option, so we can read the sheet in slices with skiprows and nrows]") | |
| code = """ | |
| import pandas as pd | |
| chunk_size = 100 | |
| start = 0 | |
| while True: | |
|     # keep the header (row 0), skip the data rows already read | |
|     chunk = pd.read_excel(r"your_file.xlsx",skiprows=range(1, start + 1),nrows=chunk_size) | |
|     if chunk.empty: | |
|         break | |
|     print(chunk.shape) | |
|     print("*"*50) | |
|     start += chunk_size | |
| """ | |
| st.code(code,language="python") | |
| st.markdown(":grey-background[**4.Takes a long time to load a huge data set**:]Loading a huge dataset can take a long time because of factors such as file size, disk I/O speed and memory constraints.") | |
| st.markdown(":rainbow[we can often reduce the load time by using polars, which offers a pandas-like DataFrame API]") | |
| code = """ | |
| import time | |
| import polars | |
| x = time.time() | |
| data2 = polars.read_excel(r"your_file.xlsx") | |
| y = time.time() | |
| print("loaded in {:.2f} seconds".format(y - x)) | |
| """ | |
| st.code(code,language="python") | |
| show_back_button(lambda: st.session_state.update(current_page="main")) | |
| if st.button(":rainbow[SQL Data Format]💻 "): | |
| st.markdown("**SQL** (Structured Query Language) is used to query and manage data in relational databases.") | |
| st.write(":rainbow[**Python Code to Read SQL Data**:]") | |
| code = """ | |
| import pandas as pd | |
| from sqlalchemy import create_engine | |
| engine = create_engine("sqlite:///your_database.db") | |
| df = pd.read_sql("SELECT * FROM your_table", engine) | |
| print(df.head()) | |
| """ | |
| st.code(code,language="python") | |
| st.write(":grey-background[**SQL Query Example**:]") | |
| code = """ | |
| -- Query to find the average salary in each department | |
| SELECT department, AVG(salary) FROM employees GROUP BY department; | |
| """ | |
| st.code(code,language="sql") | |
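To see what the GROUP BY query above returns, here is a minimal sketch that runs it against an in-memory SQLite database. The `employees` rows (names, departments, salaries) are made up for illustration, and `ORDER BY` is added only to make the output order predictable.

```python
import sqlite3

# Hypothetical in-memory table; the rows below are invented for this sketch.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, department TEXT, salary REAL)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [("Ann", "HR", 50000.0), ("Bob", "HR", 60000.0), ("Cy", "IT", 80000.0)],
)

# Same aggregation as the query above: one average per department.
rows = conn.execute(
    "SELECT department, AVG(salary) FROM employees "
    "GROUP BY department ORDER BY department"
).fetchall()
conn.close()

print(rows)
```

Each returned tuple pairs a department with the mean salary of its rows.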
| st.write(":grey-background[**Common SQL Issues**:]") | |
| st.write(""" | |
| - **Slow Queries**: Optimize by adding indexes on columns used in filters and joins. | |
| - **Data Integrity**: Ensure consistency by using primary and foreign keys. | |
| """) | |
| show_back_button(lambda: st.session_state.update(current_page="main")) | |
| st.subheader(":rainbow-background[Unstructured Data]:") | |
| st.markdown("Unstructured data refers to information that doesn't have a predefined data model or isn't organized in a specific manner. Unlike structured data, which is neatly organized in tables and columns, unstructured data is more free-form and can come in various formats.") | |
| st.image("https://www.analytixlabs.co.in/blog/wp-content/uploads/2022/11/Unstructured-Data.png") | |
| st.markdown(":rainbow[Unstructured data may come from:]") | |
| st.markdown(":blue-background[**Text Documents**:] Emails, Word documents, PDFs, and other text files.") | |
| st.markdown(":blue-background[**Multimedia Files**:] `Images, videos, and audio files`.") | |
| st.markdown(":blue-background[**Social Media Content**:] Posts, comments, tweets, and messages from platforms like Facebook, Twitter, and Instagram.") | |
| st.markdown(":blue-background[**Web Pages**:] HTML content from websites.") | |
| st.markdown(":blue-background[**Sensor Data**:] Data from IoT devices that may not follow a specific structure.") | |
| st.markdown(":blue-background[**Logs**:] System logs, server logs, and application logs.") | |
| if st.session_state.show_nested: | |
| st.subheader("Different types of Unstructured Data :") | |
| if st.button(":rainbow[IMAGE]🖼️"): | |
| st.subheader(":rainbow[Image]") | |
| st.markdown(":green[What is an image ?]") | |
| st.markdown("An image is a 2D representation of the visible spectrum. Digitally, it is a grid of tiny squares called pixels. Each pixel represents a single point in the image and contains color information.") | |
| st.markdown("•**Pixels:** The smallest unit of an image; each pixel holds a color value. In a digital image, these colors are typically represented using combinations of red, green, and blue (RGB).") | |
| st.markdown("•**Resolution:** The number of pixels in an image determines its resolution. Higher resolution means more pixels, which generally results in a clearer and more detailed image.") | |
| st.markdown("**Example:** A 2000x1040 image has 2000 pixels horizontally and 1040 pixels vertically, a total of 2,080,000 pixels.") | |
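The resolution arithmetic above can be checked with a few lines of plain Python. The helper names `pixel_count` and `raw_size_bytes` are hypothetical, and the size estimate assumes an uncompressed image with one byte per channel.

```python
def pixel_count(width, height):
    """Total number of pixels in a width x height image."""
    return width * height


def raw_size_bytes(width, height, channels=3):
    """Uncompressed size, assuming one byte (uint8) per channel."""
    return pixel_count(width, height) * channels


print(pixel_count(2000, 1040))
print(raw_size_bytes(2000, 1040))
```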
| st.markdown(":green[How an Image is formed ?]") | |
| st.markdown("An image is formed when light (or another form of energy) from a source is captured and represented visually. The specifics depend on the medium (e.g., human eye, camera, microscope).") | |
| st.subheader(":green[What do we mean by ColourSpaces in OpenCV?]") | |
| st.markdown("A color space is a specific organization of colors, that is, a way to represent and interpret colors in images. OpenCV provides a wide range of color spaces and functions to convert images between them.") | |
| st.subheader(":green[Common Colourspaces Available in OpenCV:]") | |
| st.markdown("**BGR (Blue-Green-Red):**") | |
| st.markdown("•Default color space for images loaded in OpenCV.") | |
| st.markdown("•Not to be confused with RGB, as OpenCV stores color channels in the order Blue, Green, and Red.") | |
| st.markdown("**RGB (Red-Green-Blue):**") | |
| st.markdown("•Widely used in digital displays and image processing.") | |
| st.markdown("•Conversion from BGR is necessary if you pass an image loaded by OpenCV to RGB-based operations.") | |
| st.markdown("**Grayscale (Single Channel):**") | |
| st.markdown("•Intensity-only representation.") | |
| st.markdown("•Commonly used for image processing tasks like edge detection and thresholding.") | |
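As a rough illustration of the three colour spaces above, the sketch below works on a single pixel in plain Python. The helper names are hypothetical; the grayscale weights (0.299, 0.587, 0.114) are the standard luma weights that OpenCV documents for its colour-to-gray conversion, and real code would call `cv2.cvtColor` on whole arrays instead.

```python
def bgr_to_rgb(pixel):
    """Reverse the channel order of one (B, G, R) pixel."""
    b, g, r = pixel
    return (r, g, b)


def bgr_to_gray(pixel):
    """Weighted sum of the channels, rounded to an 8-bit intensity."""
    b, g, r = pixel
    return round(0.299 * r + 0.587 * g + 0.114 * b)


pure_blue_bgr = (255, 0, 0)          # blue comes first in BGR order
print(bgr_to_rgb(pure_blue_bgr))     # same colour written in RGB order
print(bgr_to_gray(pure_blue_bgr))    # blue contributes least to intensity
```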
| if st.button(":grey-background[Basic Operations in OpenCV]"): | |
| st.markdown(":green[cv2.imread():]") | |
| st.markdown("Reads an image file into a NumPy array.") | |
| code = ''' | |
| import cv2 | |
| # Read an image | |
| image = cv2.imread('image.jpg') | |
| ''' | |
| st.code(code,language="python") | |
| st.markdown(":green[cv2.imshow():]") | |
| st.markdown("-->Displays an array as an image.") | |
| st.markdown("-->Creates a pop-up window.") | |
| st.markdown("-->Takes the array and displays it in the pop-up window.") | |
| code = ''' | |
| import cv2 | |
| import numpy as np | |
| white_img = np.full((500,500),255,dtype = np.uint8) | |
| black_img = np.zeros((500,500),dtype = np.uint8) | |
| cv2.imshow("white",white_img) | |
| cv2.imshow("black",black_img) | |
| cv2.waitKey() | |
| cv2.destroyAllWindows() | |
| ''' | |
| st.code(code,language="python") | |
| st.markdown(":green[cv2.waitKey():]") | |
| st.markdown("Waits for a key press for the given number of milliseconds, which keeps the window on screen for that long.") | |
| code = ''' | |
| import cv2 | |
| # Wait for a key event | |
| cv2.waitKey(0) # 0 (or no argument) waits indefinitely for a key press | |
| # any positive number waits at most that many milliseconds | |
| ''' | |
| st.code(code,language="python") | |
| st.markdown(":green[cv2.destroyAllWindows():]") | |
| st.markdown("Closes all OpenCV windows.") | |
| code = ''' | |
| import cv2 | |
| # Close all OpenCV windows | |
| cv2.destroyAllWindows() | |
| ''' | |
| st.code(code,language="python") | |
| st.markdown(":green[cv2.merge():]") | |
| st.markdown("Merges single-channel arrays into one multi-channel image.") | |
| code = ''' | |
| import cv2 | |
| import numpy as np | |
| # Create single-channel images | |
| blue = np.zeros((300, 300), dtype='uint8') | |
| green = np.zeros((300, 300), dtype='uint8') | |
| red = np.zeros((300, 300), dtype='uint8') | |
| # Merge channels into a color image | |
| merged_image = cv2.merge([blue, green, red]) | |
| ''' | |
| st.code(code,language="python") | |
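| st.markdown(":green[cv2.resize():]") | |
| st.markdown("Resizes the array to the required width and height.") | |
| code = ''' | |
| import cv2 | |
| image = cv2.imread('image.jpg') | |
| width, height = 300, 200 # example target size in pixels | |
| # Resize the image | |
| resized_image = cv2.resize(image, (width, height)) | |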
| ''' | |
| st.code(code,language="python") | |
| st.markdown(":green[cv2.imwrite():]") | |
| st.markdown("Saves an array to an image file.") | |
| code = ''' | |
| import cv2 | |
| # Write the image to a file | |
| cv2.imwrite('output.jpg', image) | |
| ''' | |
| st.code(code,language="python") | |
| st.markdown(":green[flatten()]") | |
| st.markdown("The flatten() method of a NumPy array converts a multi-dimensional array (such as an image) into a one-dimensional array.") | |
| code = ''' | |
| import cv2 | |
| img = cv2.imread('image.jpg') | |
| flat = img.flatten() # shape becomes (height * width * channels,) | |
| ''' | |
| st.code(code,language="python") | |
| show_back_button(lambda: st.session_state.update(current_page="main")) | |
| if st.button(":grey-background[Conversion of ColourSpaces:]"): | |
| st.markdown("**:red[1.cv2.COLOR_BGR2RGB:]**") | |
| st.markdown("Converts BGR to RGB.") | |
| code = ''' | |
| import cv2 | |
| # Read the image in BGR format | |
| image_bgr = cv2.imread('example.jpg') | |
| # Convert the image to RGB format | |
| image_rgb = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2RGB) | |
| # Display the BGR and RGB images using OpenCV | |
| cv2.imshow('BGR Image', image_bgr) | |
| cv2.imshow('RGB Image', image_rgb) | |
| # Wait for a key press and close the windows | |
| cv2.waitKey(0) | |
| cv2.destroyAllWindows() | |
| ''' | |
| st.code(code,language="python") | |
| st.markdown("**:red[2.cv2.COLOR_BGR2GRAY:]**") | |
| st.markdown("Converts BGR to grayscale.") | |
| code = ''' | |
| import cv2 | |
| # Read the image in BGR format | |
| image_bgr = cv2.imread('example.jpg') | |
| # Convert the image to grayscale | |
| image_gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY) | |
| # Display the original and grayscale images | |
| cv2.imshow('BGR Image', image_bgr) | |
| cv2.imshow('Grayscale Image', image_gray) | |
| # Wait for a key press and close the windows | |
| cv2.waitKey(0) | |
| cv2.destroyAllWindows() | |
| ''' | |
| st.code(code,language="python") | |
| st.markdown("**:red[3.cv2.COLOR_GRAY2BGR:]**") | |
| st.markdown("Converts grayscale to BGR.") | |
| code = ''' | |
| import cv2 | |
| # Read the image in BGR format | |
| image_bgr_original = cv2.imread('example.jpg') | |
| # Convert the BGR image to grayscale | |
| image_gray = cv2.cvtColor(image_bgr_original, cv2.COLOR_BGR2GRAY) | |
| # Convert the grayscale image back to BGR format | |
| image_bgr = cv2.cvtColor(image_gray, cv2.COLOR_GRAY2BGR) | |
| # Display the original BGR, grayscale, and converted BGR images | |
| cv2.imshow('Original BGR Image', image_bgr_original) | |
| cv2.imshow('Grayscale Image', image_gray) | |
| cv2.imshow('Converted BGR Image', image_bgr) | |
| # Wait for a key press and close the windows | |
| cv2.waitKey(0) | |
| cv2.destroyAllWindows() | |
| ''' | |
| st.code(code,language="python") | |
| st.markdown("`cv2.cvtColor` is a versatile function in OpenCV that is used to convert images from one color space to another. It is a powerful tool for tasks like changing the image's color space (e.g., BGR to RGB, grayscale,etc...) and performing channel manipulation.") | |
| show_back_button(lambda: st.session_state.update(current_page="main")) | |
| if st.button(":grey-background[Split and Merge]"): | |
| st.subheader("How to Split and Merge Any coloured and grayscale image?") | |
| st.subheader("**:green[Split]**") | |
| st.markdown(":red[Splitting a Color Image into Channels:]") | |
| st.markdown("A color image typically has three channels: :red[Red], :green[Green], and :blue[Blue] (RGB).") | |
| code = ''' | |
| import cv2 | |
| # Read a color image | |
| image = cv2.imread('color_image.jpg') | |
| # Split the image into B, G, R channels | |
| b, g, r = cv2.split(image) | |
| # Display or save individual channels | |
| cv2.imshow('Blue Channel', b) | |
| cv2.imshow('Green Channel', g) | |
| cv2.imshow('Red Channel', r) | |
| cv2.waitKey(0) | |
| cv2.destroyAllWindows() | |
| ''' | |
| st.code(code,language="python") | |
| st.markdown(":red[Splitting a Grayscale Image]") | |
| st.markdown("Grayscale images have only one channel. To `split` such an image, simply treat the entire image as a single channel.") | |
| code = ''' | |
| import cv2 | |
| # A grayscale image is essentially a single channel | |
| grayscale_image = cv2.imread('grayscale_image.jpg', cv2.IMREAD_GRAYSCALE) | |
| # Mimicking splitting, just to handle it like a channel | |
| single_channel = grayscale_image.copy() | |
| ''' | |
| st.code(code,language="python") | |
| st.subheader("**:green[Merge]**") | |
| st.markdown(":red[Merging Channels to Create a Color Image]") | |
| st.markdown("Combine the channels back into a single image.") | |
| code = ''' | |
| # Merge the channels back | |
| merged_image = cv2.merge((b, g, r)) | |
| cv2.imshow('Merged Image', merged_image) | |
| cv2.waitKey(0) | |
| cv2.destroyAllWindows() | |
| ''' | |
| st.code(code,language="python") | |
| st.markdown(":red[Merging Color and Grayscale]") | |
| st.markdown("To merge a grayscale image with a color image, replace one channel of the color image with the grayscale image. Both images must have the same height and width.") | |
| code = ''' | |
| # Replace the blue channel with grayscale | |
| b, g, r = cv2.split(image) | |
| # Match the grayscale image to the color image size before merging | |
| grayscale_image = cv2.resize(grayscale_image, (image.shape[1], image.shape[0])) | |
| merged_with_grayscale = cv2.merge((grayscale_image, g, r)) | |
| cv2.imshow('Merged with Grayscale', merged_with_grayscale) | |
| cv2.waitKey(0) | |
| cv2.destroyAllWindows() | |
| ''' | |
| st.code(code,language="python") | |
| show_back_button(lambda: st.session_state.update(current_page="main")) | |
| if st.button(":grey-background[Transformation]"): | |
| st.subheader(":green[Affine Transformation]") | |
| st.markdown("Affine transformations preserve points, straight lines, and planes. Parallel lines remain parallel after an affine transformation. They are commonly used in applications like `Image Augmentation`.") | |
| code = ''' | |
| import numpy as np | |
| # Define the transformation matrix (shifts 50 pixels right and 50 pixels down) | |
| matrix = np.float32([[1, 0, 50], [0, 1, 50]]) | |
| ''' | |
| st.code(code,language="python") | |
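To make the 2x3 matrix concrete, the sketch below applies such a matrix to a single point in plain Python. `apply_affine` is a hypothetical helper written for this illustration; it only shows the coordinate arithmetic and is not how `cv2.warpAffine` resamples pixels.

```python
def apply_affine(matrix, point):
    """Map (x, y) through a 2x3 affine matrix [[a, b, tx], [c, d, ty]]."""
    (a, b, tx), (c, d, ty) = matrix
    x, y = point
    return (a * x + b * y + tx, c * x + d * y + ty)


translation = [[1, 0, 50], [0, 1, 50]]   # shift by (50, 50)
scaling = [[2, 0, 0], [0, 2, 0]]         # double both axes
shear_x = [[1, 3, 0], [0, 1, 0]]         # x grows with y (shear factor 3)

point = (10, 20)
print(apply_affine(translation, point))
print(apply_affine(scaling, point))
print(apply_affine(shear_x, point))
```

The first two columns of the matrix carry the linear part (scale, shear, rotation); the last column is the shift.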
| st.subheader(":green[Transformations:]") | |
| st.markdown("* **Translation**") | |
| st.markdown("* **Rotation**") | |
| st.markdown("* **Scaling**") | |
| st.markdown("* **Shearing**") | |
| st.markdown("* **Cropping**") | |
| st.image("https://assets.datacamp.com/production/repositories/2085/datasets/e807bc1ad34e35ac264fd494ab24bae2a8c3a12b/Ch4_L3_Transformations.png") | |
| st.markdown("**<u>Translation Matrix</u>:**",unsafe_allow_html=True) | |
| st.markdown("* Translation matrix is used to shift an image from one location to another.") | |
| st.markdown("* It moves every point of the image by the same amount in a specified direction.") | |
| code = ''' | |
| import cv2 | |
| import numpy as np | |
| img = cv2.imread("image.jpg") | |
| # Translation matrix | |
| tx = 50 | |
| ty = 0 | |
| t_m = np.array([[1,0,tx],[0,1,ty]],dtype = np.float32) | |
| t_img = cv2.warpAffine(img,t_m,(2100, 2000),borderMode = cv2.BORDER_CONSTANT,borderValue = (255,255,0)) # border filled with cyan (BGR order) | |
| t_img1 = cv2.warpAffine(img,t_m,(2100, 2000),borderMode = cv2.BORDER_REFLECT) | |
| cv2.imshow("o_i",img) | |
| cv2.imshow("t_i",t_img) | |
| cv2.imshow("t_i1",t_img1) | |
| cv2.waitKey() | |
| cv2.destroyAllWindows() | |
| ''' | |
| st.code(code,language="python") | |
| st.markdown("**<u>Rotation</u>:**",unsafe_allow_html=True) | |
| st.markdown("* A rotation matrix is used to rotate an image around a specific point.") | |
| code = ''' | |
| # Rotation matrix: centre (x, y), angle in degrees (positive is counter-clockwise), scale | |
| r_m = cv2.getRotationMatrix2D((1050,1000),60,1) | |
| # Using cv2.warpAffine to apply rotation to the image. | |
| r_img = cv2.warpAffine(img,r_m,(2100, 2000),borderMode = cv2.BORDER_REFLECT) | |
| # To Display the original and rotated image | |
| cv2.imshow("o_i",img) | |
| cv2.imshow("r_i",r_img) | |
| cv2.waitKey() | |
| cv2.destroyAllWindows() | |
| ''' | |
| st.code(code,language="python") | |
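The rotation matrix can also be built by hand. The sketch below follows the formula given in the OpenCV documentation for `getRotationMatrix2D` (alpha = scale*cos, beta = scale*sin); the helper names `rotation_matrix` and `apply_affine` are hypothetical and exist only for this illustration.

```python
import math


def rotation_matrix(center, angle_deg, scale=1.0):
    """2x3 matrix rotating about `center` by `angle_deg` degrees."""
    cx, cy = center
    alpha = scale * math.cos(math.radians(angle_deg))
    beta = scale * math.sin(math.radians(angle_deg))
    return [
        [alpha, beta, (1 - alpha) * cx - beta * cy],
        [-beta, alpha, beta * cx + (1 - alpha) * cy],
    ]


def apply_affine(matrix, point):
    """Map (x, y) through a 2x3 affine matrix."""
    (a, b, tx), (c, d, ty) = matrix
    x, y = point
    return (a * x + b * y + tx, c * x + d * y + ty)


m = rotation_matrix((1050, 1000), 60)
print(apply_affine(m, (1050, 1000)))   # the centre of rotation stays in place
```

Because image coordinates have y pointing down, a positive angle appears as a counter-clockwise turn on screen.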
| st.markdown("**<u>Scaling(Zoom In or Zoom Out)</u>:**",unsafe_allow_html=True) | |
| st.markdown("* A scaling matrix is used to resize an image.") | |
| st.markdown("* It changes the size of the image by scaling it along the x-axis and y-axis.") | |
| code = ''' | |
| # Method1 | |
| sx = 2 # sx and sy are the scaling factors along the x-axis and y-axis, respectively. | |
| sy = 2 | |
| tx = 0 | |
| ty = 0 | |
| sc_m = np.array([[sx,0,tx],[0,sy,ty]],dtype = np.float32) | |
| scl_img = cv2.warpAffine(img,sc_m,(2*2100,2*2000)) | |
| cv2.imshow("o_i",img) | |
| cv2.imshow("scl_img",scl_img) | |
| cv2.waitKey() | |
| cv2.destroyAllWindows() | |
| ''' | |
| st.code(code,language="python") | |
| code = ''' | |
| # Method2 | |
| cv2.imshow("o_i",cv2.resize(img,(500,500))) | |
| cv2.imshow("s_i",cv2.resize(img,(100,100))) | |
| cv2.waitKey() | |
| cv2.destroyAllWindows() | |
| ''' | |
| st.code(code,language="python") | |
| code = ''' | |
| # Method3 | |
| zo_o = cv2.pyrDown(img) # roughly halves the width and height (zoom out) | |
| zo_i = cv2.pyrUp(img) # doubles the width and height (zoom in) | |
| cv2.imshow("i_o",zo_o) | |
| cv2.imshow("s_i",zo_i) | |
| cv2.waitKey() | |
| cv2.destroyAllWindows() | |
| ''' | |
| st.code(code,language="python") | |
| st.markdown("**<u>Shearing</u>:**",unsafe_allow_html=True) | |
| st.markdown("* Shearing is a type of affine transformation that slants the shape of an image.") | |
| st.markdown("* It shifts the pixels of an image in a specific direction, creating a skewed effect.") | |
| st.markdown("* Shearing can be applied horizontally or vertically.") | |
| code = ''' | |
| shx = 3 # shx is the shear factor along the x-axis. | |
| shy = 0 # shy is the shear factor along the y-axis. | |
| tx = 0 | |
| ty = 0 | |
| shr_m = np.array([[1,shx,tx],[shy,1,ty]],dtype = np.float32) | |
| shr_img = cv2.warpAffine(img,shr_m,(2*2100,2*2000)) | |
| cv2.imshow("o_i",img) | |
| cv2.imshow("shr_img",shr_img) | |
| cv2.waitKey() | |
| cv2.destroyAllWindows() | |
| ''' | |
| st.code(code,language="python") | |
| st.markdown("**<u>Cropping</u>:**",unsafe_allow_html=True) | |
| st.markdown("* Cropping is the extraction of a specific region of interest (ROI) from an image, often after applying geometric transformations.") | |
| st.markdown("* It ensures only the relevant portion of the image is retained.") | |
| code = ''' | |
| import cv2 | |
| img = cv2.imread("image.jpg") | |
| print(img.shape) # (height, width, channels) | |
| cr_img = img[1:39,50:130] # rows 1-38 and columns 50-129; each start index must be smaller than its stop index | |
| cv2.imshow("o_i",img) | |
| cv2.imshow("cro_img",cr_img) | |
| cv2.waitKey() | |
| cv2.destroyAllWindows() | |
| ''' | |
| st.code(code,language="python") | |
| show_back_button(lambda: st.session_state.update(current_page="main")) | |
| if st.button(":rainbow[VIDEO]🎥"): | |
| st.subheader(":green[Video]") | |
| st.markdown("* Video handling in OpenCV refers to the process of capturing, processing, displaying, and saving video frames using OpenCV's tools.") | |
| st.markdown("* It enables operations such as reading video files, accessing live camera feeds, applying transformations to video frames, and writing processed frames to new files.") | |
| st.markdown(":red[How to play a Video in OpenCV?]") | |
| st.markdown("**<u>Capturing Video</u>:**",unsafe_allow_html=True) | |
| st.markdown("* OpenCV provides `cv2.VideoCapture()` to read video files or capture live video from a webcam.") | |
| code = ''' | |
| vid = cv2.VideoCapture('video.mp4') # For a video file | |
| vid1 = cv2.VideoCapture(0) # For live capture from the default webcam | |
| ''' | |
| st.code(code,language="python") | |
| st.markdown("**<u>Displaying Video</u>:**",unsafe_allow_html=True) | |
| st.markdown("* Frames can be displayed using `cv2.imshow()` in real-time.") | |
| st.markdown(":red[How to convert a video or image from BGR to grayscale?]") | |
| code = ''' | |
| import cv2 | |
| vid = cv2.VideoCapture("video.mp4") | |
| while True: | |
|     succ,img = vid.read() | |
|     if not succ: | |
|         break | |
|     img1 = cv2.cvtColor(img,cv2.COLOR_BGR2GRAY) | |
|     cv2.imshow("video_color",img) | |
|     cv2.imshow("video_gray",img1) | |
|     if cv2.waitKey(3) & 255 == ord("A"): # stop when uppercase A is pressed | |
|         break | |
| vid.release() | |
| cv2.destroyAllWindows() | |
| ''' | |
| st.code(code,language="python") | |
| st.markdown(":red[How to split a video into its B, G, R channels and a grayscale view?]") | |
| code = ''' | |
| import cv2 | |
| import numpy as np | |
| vid = cv2.VideoCapture("video.mp4") | |
| while True: | |
|     succ,img = vid.read() | |
|     if not succ: | |
|         break | |
|     b,g,r = cv2.split(img) | |
|     z = np.zeros(b.shape,dtype = np.uint8) | |
|     img1 = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY) | |
|     cv2.imshow("blue_channel",cv2.merge([b,z,z])) | |
|     cv2.imshow("green_channel",cv2.merge([z,g,z])) | |
|     cv2.imshow("red_channel",cv2.merge([z,z,r])) | |
|     cv2.imshow("video_gray",img1) | |
|     if cv2.waitKey(3) & 255 == ord("A"): # stop when uppercase A is pressed | |
|         break | |
| vid.release() | |
| cv2.destroyAllWindows() | |
| ''' | |
| st.code(code,language="python") | |
| st.markdown("**:red[Showing BGR and Grayscale Colour Space at a time by live stream:]**") | |
| code = ''' | |
| import cv2 | |
| vid = cv2.VideoCapture(0) | |
| while True: | |
|     succ,img = vid.read() | |
|     if not succ: | |
|         print("web camera is not opening") | |
|         break | |
|     img1 = cv2.cvtColor(img,cv2.COLOR_BGR2GRAY) | |
|     cv2.imshow("video_gray",img1) | |
|     cv2.imshow("Live Stream",img) | |
|     if cv2.waitKey(1) & 255 == ord("w"): | |
|         break | |
| vid.release() | |
| cv2.destroyAllWindows() | |
| ''' | |
| st.code(code,language="python") | |
| st.markdown("**:rainbow[Explanation:]**") | |
| st.markdown("**1.Capture Video:**") | |
| st.markdown("* `cv2.VideoCapture(0)` initializes the webcam. The parameter 0 typically refers to the default webcam.") | |
| st.markdown("**2.Infinite Loop:**") | |
| st.markdown("* **Read Frames:** `vid.read()` reads a single frame from the webcam.") | |
| st.markdown("* `succ` indicates if the frame was successfully read.") | |
| st.markdown("* `img` contains the frame data.") | |
| st.markdown("* **Error Handling:** If the frame isn't captured (succ == False), an error message is printed, and the loop breaks.") | |
| st.markdown("**3.Grayscale Conversion:**") | |
| st.markdown("* `cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)` converts the color image to grayscale.") | |
| st.markdown("* **Display Windows:**") | |
| st.markdown("* `cv2.imshow('video_gray', img1)` displays the grayscale video feed.") | |
| st.markdown("* `cv2.imshow('Live Stream', img)` displays the colored video feed.") | |
| st.markdown("* **Exit Condition:**") | |
| st.markdown("* The loop exits when the user presses the 'w' key. This is detected by cv2.waitKey(1) & 255 == ord('w').") | |
| st.markdown("* **Clean Up:**") | |
| st.markdown("* `cv2.destroyAllWindows()` closes all OpenCV windows once the loop ends.") | |
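The exit condition above masks the value returned by `cv2.waitKey` with 255 before comparing it to a character code. A minimal sketch of that check in plain Python follows; `is_key` is a hypothetical helper, and the example with extra high bits is an assumption about what some platforms may return rather than documented behaviour.

```python
def is_key(code, char):
    """True if the lowest 8 bits of `code` match the character's code."""
    return (code & 0xFF) == ord(char)


print(is_key(ord("w"), "w"))              # plain key code
print(is_key(0x100000 | ord("w"), "w"))   # same key with extra high bits set
print(is_key(-1, "w"))                    # -1 means no key was pressed
```

Note that `&` binds tighter than `==` in Python, so `cv2.waitKey(1) & 255 == ord("w")` already groups the same way as the parenthesised form.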
| show_back_button(lambda: st.session_state.update(current_page="main")) | |
| # Nested buttons, conditionally rendered | |
| if st.session_state.show_nested: | |
| st.subheader(":rainbow[Semi-Structured Data]") | |
| st.markdown("Semi-structured data is a type of data that does not conform to a rigid structure like structured data but still contains tags or markers to separate data elements. This makes it more flexible than structured data while still retaining some level of organization.") | |
| st.subheader("Here are the different types of semi structured files:") | |
| if st.button(":rainbow[XML]π"): | |
| st.subheader(":rainbow-background[XML]") | |
| st.markdown("XML (Extensible Markup Language) is a flexible text format used to store and transport data. It is both human-readable and machine-readable, making it a popular choice for data interchange between systems.") | |
| st.markdown("`.xml` is the extension of XML files.") | |
| st.markdown("`Sample XML:`") | |
| code = ''' | |
| <bookstore> | |
| <book> | |
| <title>Learning XML</title> | |
| <author>John Doe</author> | |
| <price>29.99</price> | |
| </book> | |
| <book> | |
| <title>Advanced XML</title> | |
| <author>Jane Smith</author> | |
| <price>39.99</price> | |
| </book> | |
| </bookstore> | |
| ''' | |
| st.code(code,language="xml") | |
| code = ''' | |
| import pandas as pd | |
| pd.read_xml(r"your_file_path.xml") | |
| ''' | |
| st.code(code,language="python") | |
| st.markdown("`we can choose which elements to load by using xpath`") | |
| code = ''' | |
| # For the sample above, select every book element | |
| pd.read_xml(r"your_file_path.xml",xpath = ".//book") | |
| ''' | |
| st.code(code,language="python") | |
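As a rough picture of what selecting elements and flattening them into rows involves, the sketch below parses the sample bookstore XML with the standard library's `xml.etree.ElementTree`. This is an illustration under the assumption that every book has the same three child tags; it is not how `pd.read_xml` is implemented.

```python
import xml.etree.ElementTree as ET

xml_text = """
<bookstore>
  <book>
    <title>Learning XML</title>
    <author>John Doe</author>
    <price>29.99</price>
  </book>
  <book>
    <title>Advanced XML</title>
    <author>Jane Smith</author>
    <price>39.99</price>
  </book>
</bookstore>
""".strip()

root = ET.fromstring(xml_text)
records = []
for book in root.findall("./book"):        # one row per book element
    records.append({
        "title": book.findtext("title"),
        "author": book.findtext("author"),
        # XML text is always a string, so convert numbers explicitly
        "price": float(book.findtext("price")),
    })

print(records)
```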
| st.markdown("`we can convert a data frame into an xml file as follows`") | |
| code = ''' | |
| data = pd.read_xml(r"your_file_path.xml") | |
| data.to_xml(r"your_output_path.xml") | |
| ''' | |
| st.code(code,language="python") | |
| st.markdown("**`Issues encountered while handling xml files`:**") | |
| st.markdown(":blue[1. Complex Nested Structure:]") | |
| st.markdown("`Issue:` XML files often have deeply nested hierarchies, making it difficult to extract specific elements or attributes.") | |
| st.markdown("`Impact:` Parsing and flattening the data into a tabular format (for machine learning) becomes time-consuming.") | |
| st.markdown(":blue[2. Large File Size:]") | |
| st.markdown("`Issue:` Large XML files can consume significant memory and CPU resources during parsing.") | |
| st.markdown("`Impact:` Slows down the pipeline or causes memory errors.") | |
| st.markdown(":blue[3. Irregular or Missing Tags:]") | |
| st.markdown("`Issue:` Some tags might be missing or inconsistent across records, causing errors during parsing.") | |
| st.markdown("`Impact:` Leads to incomplete or inconsistent data for machine learning models.") | |
| st.markdown(":blue[4. Data Type Inconsistencies:]") | |
| st.markdown("`Issue:` XML stores all data as strings by default, requiring manual conversion to numerical or categorical formats.") | |
| st.markdown("`Impact:` Data preprocessing becomes more complex.") | |
| st.markdown(":blue[5. Encoding Issues:]") | |
| st.markdown("`Issue:` XML files may use non-standard character encodings.") | |
| st.markdown("`Impact:` Leads to parsing errors or unreadable text.") | |
| st.markdown(":blue[6. Parsing Overhead:]") | |
| st.markdown("`Issue:` XML parsers like ElementTree or minidom may be inefficient for large files.") | |
| st.markdown("`Impact:` Parsing large XML data can slow down the machine learning pipeline.") | |
| show_back_button(lambda: st.session_state.update(current_page="main")) | |
| if st.button(":rainbow[Csv]π"): | |
| st.subheader(":rainbow-background[CSV]") | |
| st.write("CSV (Comma-Separated Values) is a simple file format used to store tabular data, such as a spreadsheet or database. Each line in a CSV file represents a row in the table, and each value is separated by a comma.") | |
| st.markdown("`.csv` is the extension of CSV files.") | |
| st.subheader("Handling of csv files") | |
| st.markdown("`How to read csv files?`") | |
| code = ''' | |
| import pandas as pd | |
| pd.read_csv(r"your_file_path.csv") | |
| ''' | |
| st.code(code,language="python") | |
| st.markdown("**`Issues encountered while handling csv files`:**") | |
| st.markdown("**1.Parser error**") | |
| st.markdown("`overcoming parser error:`") | |
| code = ''' | |
| pd.read_csv(r"your_file_path.csv",on_bad_lines = "skip") | |
| ''' | |
| st.code(code,language="python") | |
| st.markdown("This skips bad lines (rows with too many fields) and loads the rest.") | |
| code = ''' | |
| pd.read_csv(r"your_file_path.csv",on_bad_lines = "warn") | |
| ''' | |
| st.code(code,language="python") | |
| st.markdown("This also skips bad lines, but raises a warning for each one.") | |
| st.markdown("**2.Encoding error**") | |
| st.markdown("`Overcoming encoding error:`") | |
| code = ''' | |
| import encodings.aliases | |
| import pandas as pd | |
| l = list(encodings.aliases.aliases.keys()) # every encoding alias Python knows | |
| for y in l: | |
|     try: | |
|         pd.read_csv(r"your_file_path.csv",encoding=y) | |
|         print("{} reads the file without errors".format(y)) | |
|     except UnicodeDecodeError: | |
|         print("{} cannot decode the file".format(y)) | |
| ''' | |
| st.code(code,language="python") | |
| st.markdown("This shows which encodings can and cannot decode the file. Decoding without errors does not guarantee the text is right, so inspect the result.") | |
| st.markdown("`LookupError`") | |
| code = ''' | |
| l1 = [] | |
| for y in l: | |
| try: | |
| pd.read_csv(r"your_file_path.csv",encoding=y,on_bad_lines="skip") | |
| print("{} is correct encoding".format(y)) | |
| except UnicodeDecodeError: | |
| print("{} is not correct encoding".format(y)) | |
| except LookupError: | |
| print("{} not supported".format(y)) | |
| l1.append(y) | |
| ''' | |
| st.code(code,language="python") | |
| st.markdown("This will give which are correct,not correct and not supported encodings.") | |
| st.markdown("**3. Out of memory**") | |
| st.markdown("`Overcoming this error:`") | |
| code = ''' | |
| # chunksize makes read_csv return an iterator of DataFrames with up to 100 rows each | |
| chunks = pd.read_csv(r"your_file_path.csv", encoding="latin", chunksize=100) | |
| c = 0  # number of chunks read so far | |
| for chunk in chunks: | |
|     print(chunk.shape) | |
|     print("*"*50) | |
|     c += 1 | |
| ''' | |
| st.code(code,language="python") | |
| st.markdown("**4. Large datasets take a long time to load**") | |
| st.markdown("`We can overcome this by using polars, a DataFrame library with an API similar to pandas.`") | |
| st.markdown("`It is often much faster than pandas at reading large files.`") | |
| code = ''' | |
| import time | |
| import polars | |
| start = time.time() | |
| data2 = polars.read_csv(r"your_file_path.csv") | |
| end = time.time() | |
| print(end - start)  # seconds taken to load the file | |
| ''' | |
| st.code(code,language="python") | |
| show_back_button(lambda: st.session_state.current_page == "main") | |
| if st.button(":rainbow[JSON]"): | |
| st.subheader(":rainbow-background[JSON]") | |
| st.markdown("JSON (JavaScript Object Notation) is a lightweight data interchange format that is easy for humans to read and write, and easy for machines to parse and generate.") | |
| st.markdown("- `.json` is the file extension for JSON files.") | |
| st.subheader("Key Features of JSON:") | |
| st.markdown("`Text-Based Format:` JSON is text-based, making it simple and platform-independent.") | |
| st.markdown("`Structure:` It organizes data into key-value pairs or as an ordered list of values.") | |
| st.markdown("`Human-Readable:` JSON uses a clear and straightforward syntax.") | |
| st.markdown("`Language-Independent:` While its syntax is inspired by JavaScript, JSON can be used with most programming languages (e.g., Python, Java, C#, PHP).") | |
| st.markdown("`Widely Supported:` JSON is widely supported in APIs, databases, and configurations.") | |
| st.markdown("`Simple JSON file format`") | |
| st.markdown("`JSON Syntax Basics:`") | |
| st.markdown(":green[1. Objects]:") | |
| st.markdown("- Defined using curly braces {}.") | |
| st.markdown("- Contains key-value pairs, where keys are strings and values can be of various types.") | |
| st.markdown(":green[Example]:") | |
| code = ''' | |
| { | |
|     "name": "John", | |
|     "age": 30, | |
|     "isEmployee": true | |
| } | |
| ''' | |
| st.code(code,language="json") | |
| st.markdown(":green[2. Arrays]:") | |
| st.markdown("- Defined using square brackets [].") | |
| st.markdown("- Contains a list of values.") | |
| code = ''' | |
| [ | |
|     "Apple", | |
|     "Banana", | |
|     "Cherry" | |
| ] | |
| ''' | |
| st.code(code,language="json") | |
| st.markdown(":green[3. Values]:") | |
| st.markdown("`Can be one of the following types:`") | |
| st.markdown('- String, e.g. `"hello"`') | |
| st.markdown("- Number, e.g. `42`") | |
| st.markdown("- Boolean, e.g. `true` or `false`") | |
| st.markdown("- Null: `null`") | |
| st.markdown("- Array, e.g. `[1, 2, 3]`") | |
| st.markdown("- Object, e.g.:") | |
| code = ''' | |
| {"key": "value"} | |
| ''' | |
| st.code(code,language="json") | |
| st.markdown(":green[4. Keys]:") | |
| st.markdown("- Must be strings enclosed in double quotes.") | |
| st.markdown("- Should be unique within an object.") | |
| st.markdown(":green-background[Semi-Structured JSON Format]") | |
| st.markdown(":green-background[max_level:] In `pd.json_normalize`, the `max_level` parameter limits how many levels of nested objects are flattened into columns. Anything nested deeper than that is kept as a dictionary inside a single column.") | |
| code = ''' | |
| { | |
|     "level1": { | |
|         "level2": { | |
|             "level3": { | |
|                 "key": "value" | |
|             } | |
|         } | |
|     } | |
| } | |
| ''' | |
| st.code(code,language="json") | |
| st.markdown("▪ With `max_level=1`, one level is flattened: the column `level1.level2` holds the rest of the nested object.") | |
| st.markdown("▪ With `max_level=2`, two levels are flattened: the column `level1.level2.level3` holds `{'key': 'value'}`.") | |
| st.markdown("▪ With `max_level=3`, the data is fully flattened: the column `level1.level2.level3.key` holds `value`.") | |
| st.markdown(":green-background[json_normalize():] json_normalize is a powerful tool for flattening semi-structured JSON data into a more structured tabular form. This is particularly useful in preprocessing JSON data for further analysis or storage. It simplifies accessing nested fields, making it easy to work with complex JSON.") | |
| st.markdown(":green-background[Key Arguments of json_normalize]") | |
| st.markdown(":green-background[data:] Input JSON data (list or dictionary).") | |
| st.markdown(":green-background[record_path:] Path to nested lists to flatten.") | |
| st.markdown(":green-background[meta:] Keys to include as columns (useful for hierarchical data).") | |
| st.markdown(":green-background[Semi-structured data:] Refers to data such as a list of dictionaries, which can be converted into a DataFrame with one row per dictionary and one column per key.") | |
| st.markdown("`Example:` JSON in a Real-World Scenario") | |
| st.markdown("`API Response Example:`") | |
| code = ''' | |
| { | |
|     "user": { | |
|         "id": 1, | |
|         "username": "john_doe", | |
|         "email": "john@example.com" | |
|     }, | |
|     "posts": [ | |
|         { | |
|             "id": 101, | |
|             "title": "JSON Basics", | |
|             "published": true | |
|         }, | |
|         { | |
|             "id": 102, | |
|             "title": "Advanced JSON", | |
|             "published": false | |
|         } | |
|     ] | |
| } | |
| ''' | |
| st.code(code,language="json") | |
| st.markdown("`In this example:`") | |
| st.markdown("- The user field is an object.") | |
| st.markdown("- The posts field is an array of objects.") | |
| st.subheader("Handling JSON files") | |
| st.markdown("`How to read JSON files?`") | |
| st.markdown("`Syntax for reading JSON files`") | |
| code = ''' | |
| # Accepts a file path, a URL, or a file-like object (wrap a JSON string in io.StringIO) | |
| pd.read_json(r"your_file_path.json") | |
| ''' | |
| st.code(code,language="python") | |
| st.markdown(":green[Data collection:]") | |
| st.markdown("`Data can be collected through APIs, which give access to large data stores. APIs work on a client-server model.`") | |
| st.markdown("- To request data, the user gives an API key to the client, which sends the request to the server.") | |
| st.markdown("- If the server approves the request, it sends a response to the client, which passes it back to the user.") | |
| st.markdown("- We usually use `rapidapi` to collect the data.") | |
| st.markdown("`Here is sample code:`") | |
| code = ''' | |
| import requests | |
| import pandas as pd | |
| url = "https://cricbuzz-cricket.p.rapidapi.com/stats/v1/player/8733/batting" | |
| headers = { | |
|     "x-rapidapi-key": "your rapidAPI key", | |
|     "x-rapidapi-host": "cricbuzz-cricket.p.rapidapi.com", | |
| } | |
| response = requests.get(url, headers=headers) | |
| data = response.json()  # parse the JSON body into Python objects | |
| print(data) | |
| data["headers"]  # the "headers" field of this response | |
| pd.json_normalize(data)  # flatten the whole response | |
| pd.json_normalize(data, meta=["headers"]) | |
| pd.json_normalize(data, record_path=["values"])  # one row per entry in "values" | |
| ''' | |
| st.code(code,language="python") | |
| show_back_button(lambda: st.session_state.current_page == "main") | |
| if st.button(":rainbow[HTML]"): | |
| st.subheader(":rainbow-background[HTML]") | |
| st.markdown("HTML (HyperText Markup Language) is the standard language used to create and design web pages. It provides the structure and layout for web content, allowing you to define elements such as headings, paragraphs, links, images, and more.") | |
| st.subheader("Handling HTML files") | |
| st.markdown("`How to read HTML files?`") | |
| code = ''' | |
| # read_html returns a list of DataFrames, one for each table found on the page | |
| data = pd.read_html("https://en.wikipedia.org/wiki/Indian_Premier_League") | |
| ''' | |
| st.code(code,language="python") | |
| code = ''' | |
| data = pd.read_html("https://en.wikipedia.org/wiki/Indian_Premier_League",match = "Kochi Cricket") | |
| data | |
| ''' | |
| st.code(code,language="python") | |
| st.markdown("The `match` parameter keeps only the tables that contain text matching the given string or regular expression, here 'Kochi Cricket'.") | |
| st.markdown(":green[Common issues when handling HTML files]:") | |
| st.markdown(":green-background[Complex HTML Structures:] HTML files can have complex and nested structures, making it difficult to extract relevant data. Using libraries like BeautifulSoup in Python can help parse and navigate these structures.") | |
| st.markdown(":green-background[Inconsistent Data Formats:] HTML files may contain data in various formats, making it challenging to standardize the data for machine learning models. Data cleaning and preprocessing steps are crucial to handle these inconsistencies.") | |
| st.markdown(":green-background[Missing or Incomplete Data:] HTML files might have missing or incomplete data, which can affect the performance of machine learning models. Techniques like imputation or using default values can help address this issue.") | |
| st.markdown(":green-background[Dynamic Content:] Some HTML files contain dynamic content generated by JavaScript, which may not be present in the static HTML source. Using tools like Selenium can help render and extract dynamic content.") | |
| st.markdown(":green-background[Large File Sizes:] HTML files can be large, making it time-consuming to read and process them. Loading data in chunks or using efficient libraries like pandas can help manage large files.") | |
| st.markdown(":green-background[Encoding Issues:] HTML files may have different encodings, leading to errors when reading the file. Specifying the correct encoding can help avoid these issues.") | |
| show_back_button(lambda: st.session_state.current_page == "main") | |
| if st.button("Cleaning Up the Data 🧹"): | |
| st.write("Data can sometimes be messy or incomplete, just like ingredients might have dirt or imperfections. We need to clean the data before using it in the model, just like you clean veggies before cooking.", | |
| "For example, some houses in our data might not have a price listed, or there might be incorrect sizes. We can fix this by filling in missing values or removing bad data.") | |
| show_back_button(lambda: st.session_state.current_page == "main") | |
| if st.button("Exploratory Data Analysis (EDA)"): | |
| st.write("Now that we have clean data, we start looking for patterns or trends. It's like looking at how different ingredients work together in a recipe. We can use graphs to visualize these patterns.", | |
| "In our house data, we might look at a graph to see if bigger houses tend to cost more. This helps us understand the relationships between data points.") | |
| show_back_button(lambda: st.session_state.current_page == "main") | |
| if st.button("Feature Engineering ⚙️"): | |
| st.write("At this stage, we can create new information from the existing data, just like adding a twist to your recipe to make it unique. We might adjust or combine some features to improve the model.", | |
| "For example, we might create a new feature that shows whether a house is 'new' or 'old' based on its age, or we might change house size into 'small', 'medium', or 'large' categories.") | |
| show_back_button(lambda: st.session_state.current_page == "main") | |
| if st.button("Choosing the Right Model 🤖"): | |
| st.write("Now we need to choose the right model, or 'recipe', that will help us solve the problem. Different problems need different models, just like different dishes need different cooking methods.", | |
| "For predicting house prices, we might use a model called 'Linear Regression', which works well when there's a clear relationship between things like house size and price.") | |
| show_back_button(lambda: st.session_state.current_page == "main") | |
| if st.button("Training the Model 🏋️‍♂️"): | |
| st.write("Once we have the model, we teach it by showing it examples from our data. It's like practicing a recipe to get better at it. The more the model practices, the more it learns.", | |
| "In this step, we show the model many examples of houses and their prices. The model learns how size, location, and age affect the price, and it adjusts itself to get better predictions.") | |
| show_back_button(lambda: st.session_state.current_page == "main") | |
| if st.button("Testing the model"): | |
| st.write("After training, we need to check if the model is performing well. It's like tasting the dish to see if it's coming out as expected. We use special tools to measure how well the model is predicting.", | |
| "For our house price model, we might use a method called 'Mean Squared Error' to see how close the model's predicted prices are to the actual prices. A lower error means the model is doing well.") | |
| show_back_button(lambda: st.session_state.current_page == "main") | |
| if st.button("Deployment"): | |
| st.write("When our model is performing well, it's time to make it available for others to use. This is called deployment, and it's like serving the dish at a restaurant. We make the model accessible so anyone can use it to make predictions.", | |
| "For example, we could create a website where people can input details about a house (like its size and location) and get an estimate of its price.") | |
| show_back_button(lambda: st.session_state.current_page == "main") | |
| if st.button("Monitoring"): | |
| st.write("Even after the model is deployed, we need to keep checking how it's doing. Just like a dish might need adjustments, a model might need updates if things change over time.", | |
| "For example, if the housing market changes, like prices going up or down, we might need to retrain our model with new data to keep it accurate.") | |
| show_back_button(lambda: st.session_state.current_page == "main") | |
| st.write("### Summary") | |
| st.write(""" | |
| Building a machine learning project is like following a recipe: You need to know what you're making, gather the right ingredients, clean them up, cook (train) your dish, and serve it to others. By following each step carefully, you'll end up with a model that can solve real-world problems! | |
| """) |